Root Cause Analysis as a Guide to SRE Methods

Author

  • Timm Grams
Abstract

Which Software Reliability Engineering (SRE) methods should be applied during the various phases of a product's lifecycle? The answer given here centres on learning from errors. The classification and evaluation of methods is based strictly on causal analyses of disasters, accidents and incidents with undesired outcomes. The lifecycle model of IEC Standard 61508 has been adopted as the classification scheme. A few examples are given. The SRE methods considered here are those of IEC 61508. These methods deal with software reliability as well as with the production of highly reliable software. Extreme Programming (XP) has been included as an example of more recent proposals.

Introduction

I discovered the rules and chronological order of Natural Software Engineering (NSE) by watching my students during a programming course in November 2003:

1. The very moment you have a faint idea of what you are supposed to do, start coding. This is called realisation.
2. Derive an excerpt from your program. This results in your concept and design.
3. Then write down the specification and define all surprising program properties (heartlessly called bugs) to be features.
4. Convince your customer (the instructor's role) that what you are able to deliver is what he truly wanted. This challenging task is called requirements engineering.

Indeed, this is the natural way: the scheme rests on firm psychological and sociological grounds. For what you are paid for is real work. Coding comes first, whereas activities like haggling over requirements, verification and documentation introduce delays. And they are real pains. For the sake of the time schedule and of comfort, these activities should be skipped or given low priority.

On the other hand, our experience points in another direction: disasters happen because of a lack of documentation, badly thought-out specifications, superficial tests and omitted verifications.
Investigating and analysing the disasters of the past makes the strongest case in favour of the not so natural and painful software engineering methods.

This paper deals with software reliability engineering (SRE) methods. They are more or less painful, and they are more or less useful. The question to be answered is: When does it pay to use them?

Anyone who has the task of building and maintaining software for automation and safety systems faces a plentiful variety of techniques for software construction and evaluation. IEC Standard 61508 alone lists 66 techniques aimed at software safety integrity, not counting all the existing variants of the methods [1, part 7, annex B]. In this situation some guidance is needed, and it is offered here. This guide comprises three components:

1. A lifecycle model serves as the basic classification scheme of SRE methods and techniques.
2. Primary causes of incidents or accidents are identified by causal analysis. Preventive measures are identified.
3. The primary causes can be attributed to lifecycle phases, and this entails a classification of SRE methods suitable for prevention.

The resulting classification (or taxonomy) of SRE methods should help in starting up SRE activities and in bringing into focus the most effective methods and techniques for solving the actual problems (fig. 1).

The framework of the following considerations is given by IEC Standard 61508, which applies to electrical, electronic and programmable electronic (E/E/PES) safety-related systems [1]. For these systems a safety plan shall be prepared at the outset, and this plan shall be updated during the entire safety lifecycle. This safety plan shall specify “procedures which ensure that hazardous incidents or incidents with potential to create hazards are analysed and recommendations made such that the probability of repeat occurrence is minimised” [1, part 1, 6.2.2k].
This imperative says how the learning from errors shall be institutionalised, and it also draws the lines along which this paper will evolve.
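
The three-component guide described in the introduction (lifecycle model, causal analysis, attribution of primary causes to lifecycle phases) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the phase names are simplified placeholders and not the phase list of IEC 61508, the method names are not a selection from the standard, and the incident data are made up. Only the cause labels echo the kinds of causes named in the introduction.

```python
from collections import Counter

# Hypothetical mapping from a primary cause (found by causal analysis)
# to a simplified lifecycle phase. Not the phase list of IEC 61508.
CAUSE_TO_PHASE = {
    "badly thought-out specification": "specification",
    "lack of documentation": "design",
    "coding slip": "implementation",
    "superficial test": "verification",
}

# Hypothetical preventive methods per phase (placeholders only).
PHASE_TO_METHODS = {
    "specification": ["requirements review"],
    "design": ["design documentation"],
    "implementation": ["coding standard"],
    "verification": ["systematic testing"],
}


def rank_phases(primary_causes):
    """Count how often each lifecycle phase is implicated, most frequent first."""
    counts = Counter(CAUSE_TO_PHASE[cause] for cause in primary_causes)
    return counts.most_common()


def suggest_methods(primary_causes):
    """List candidate methods, starting with the most frequently implicated phase."""
    methods = []
    for phase, _count in rank_phases(primary_causes):
        methods.extend(PHASE_TO_METHODS[phase])
    return methods


# Made-up data: one primary cause per analysed incident.
incidents = [
    "superficial test",
    "badly thought-out specification",
    "superficial test",
]

print(rank_phases(incidents))
print(suggest_methods(incidents))
```

The sketch only shows the direction of the argument: incidents are analysed first, and the frequency of their primary causes per phase then decides which methods come into focus.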





Publication date: 2004